    Statistical power analysis for single-cell RNA-sequencing

    RNA-sequencing (RNA-seq) is an established method to quantify gene expression levels genome-wide. The recent development of single-cell RNA sequencing (scRNA-seq) protocols makes it possible to systematically characterize cell transcriptomes and their underlying developmental and regulatory mechanisms. Since the first publication on single-cell transcriptomics a decade ago, hundreds of scRNA-seq datasets from a variety of sources have been released, profiling gene expression of sorted cells, tumors, whole dissociated organs and even complete organisms. scRNA-seq is also currently the main tool to systematically characterize human cells within the Human Cell Atlas Project. Given its wide applicability and increasing popularity, many experimental protocols and computational analysis approaches exist for scRNA-seq. However, the technology remains experimentally and computationally challenging. Firstly, single cells contain only minute amounts of mRNA, which need to be reliably captured and amplified for accurate quantification by sequencing. Importantly, amplification commonly relies on the Polymerase Chain Reaction (PCR), which can introduce biases and increase technical variation. Secondly, once the sequencing results are obtained, finding the best computational processing pipeline can be difficult. A number of comparison studies have already been conducted, especially for bulk RNA-seq, but they usually address only one aspect of the workflow. Furthermore, the extent to which the conclusions and recommendations of these studies transfer to scRNA-seq is unknown. Regarding the processing of RNA-seq data, we investigate the effect of PCR amplification on differential expression analysis. We find that computational removal of duplicates has either a negligible or a negative impact on the specificity and sensitivity of differential expression analysis. We therefore recommend not removing read duplicates by mapping position.
In contrast, if duplicates are identified using unique molecular identifiers (UMIs) that tag RNA molecules, both specificity and sensitivity improve. The first integral step of any scRNA-seq experiment is the preparation of sequencing libraries from the cells. We conducted an independent benchmarking study of popular library preparation protocols in terms of detection sensitivity, accuracy and precision. We used the same mouse embryonic stem cells and exogenous mRNA spike-ins for all protocols. We recapitulate our previous finding that technical variance is markedly decreased when UMIs are used to remove duplicates. To assign a monetary value to the detected amounts of technical variance, we developed a simulation framework. This framework enabled us to compare the power to detect differentially expressed (DE) genes across the scRNA-seq library preparation protocols. Our experience during this comparison study led to the development of two tools: zUMIs for sequencing data processing and powsimR for simulation and power analysis. zUMIs is a pipeline for processing scRNA-seq data with flexible choices regarding UMI and cell barcode design. In addition, we showed with powsimR simulations that including intronic reads in gene expression quantification increases the power to detect DE genes, and we added this as a unique feature to zUMIs. powsimR implements our simulation framework with extended data analysis options. It enables researchers to assess experimental designs and analysis plans for RNA-seq in terms of statistical power. Lastly, we conducted a systematic evaluation of scRNA-seq experimental and analytical pipelines. We found that the choice of normalisation method and library preparation protocol has the biggest impact on the validity of scRNA-seq DE analysis. Choosing a good scRNA-seq pipeline can have the same impact on detecting a biological signal as quadrupling the number of cells sampled.
Taken together, we have established and applied a simulation framework that allowed us to benchmark experimental and computational scRNA-seq protocols. It thereby informs experimental design and method choices for this important technology.
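
The idea of simulation-based power analysis described above can be sketched in a few lines. This is not powsimR's API or model. It is a minimal illustration under stated assumptions:

- Counts are drawn from a negative binomial, generated as a gamma-Poisson mixture, with an assumed mean, dispersion and fold change.
- Each simulated gene is tested with a Welch-type statistic on log1p counts, compared against a normal quantile (an approximation).
- Power is the fraction of simulations that reach significance.

All function names and parameter values are hypothetical.

```python
import math
import random
import statistics


def sample_poisson(lam, rng):
    """Poisson draw via Knuth's method; large means are split to avoid exp() underflow."""
    if lam > 30:
        # A sum of independent Poisson variates is Poisson with the summed mean.
        half = lam / 2
        return sample_poisson(half, rng) + sample_poisson(lam - half, rng)
    limit = math.exp(-lam)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def sample_nb(mean, dispersion, rng):
    """Negative binomial as gamma-Poisson mixture: variance = mean + dispersion * mean**2."""
    lam = rng.gammavariate(1 / dispersion, mean * dispersion)
    return sample_poisson(lam, rng)


def welch_z(a, b):
    """Welch-type statistic (difference of means over unpooled standard error)."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return 0.0 if se == 0 else (statistics.fmean(b) - statistics.fmean(a)) / se


def estimate_power(mean, fold_change, dispersion, n_per_group, n_sim=200, alpha=0.05, seed=1):
    """Fraction of simulated genes called significant by a two-sided test at level alpha."""
    rng = random.Random(seed)
    crit = statistics.NormalDist().inv_cdf(1 - alpha / 2)  # normal approximation
    hits = 0
    for _ in range(n_sim):
        a = [math.log1p(sample_nb(mean, dispersion, rng)) for _ in range(n_per_group)]
        b = [math.log1p(sample_nb(mean * fold_change, dispersion, rng)) for _ in range(n_per_group)]
        if abs(welch_z(a, b)) > crit:
            hits += 1
    return hits / n_sim


if __name__ == "__main__":
    for n in (12, 24, 48, 96):
        print(n, estimate_power(mean=10, fold_change=2, dispersion=0.5, n_per_group=n))
```

Setting `fold_change=1` gives an empirical false positive rate instead of power. A real framework would also model dropouts, library-size variation and the chosen DE tool.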

    The impact of amplification on differential expression analyses by RNA-seq

    Currently, quantitative RNA-seq methods are pushed to work with increasingly small starting amounts of RNA that require amplification. However, it is unclear how much noise or bias amplification introduces and how this affects the precision and accuracy of RNA quantification. To assess the effects of amplification, reads that originated from the same RNA molecule (PCR duplicates) need to be identified. Computationally, read duplicates are defined by their mapping position, which does not distinguish PCR duplicates from natural duplicates. Hence, it is unclear how duplicated reads should be treated. Here, we generate and analyse RNA-seq data sets prepared using three different protocols (Smart-Seq, TruSeq and UMI-seq). We find that a large fraction of computationally identified read duplicates are not PCR duplicates and can be explained by sampling and fragmentation bias. Consequently, the computational removal of duplicates improves neither accuracy nor precision. It can actually worsen the power and the False Discovery Rate (FDR) for differential gene expression. Even when duplicates are experimentally identified by unique molecular identifiers (UMIs), power and FDR are only mildly improved. However, the pooling of samples made possible by the early barcoding of the UMI protocol leads to an appreciable increase in the power to detect differentially expressed genes.
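
The difference between position-based and UMI-based duplicate identification can be illustrated with a toy example. The sketch assumes each read has been reduced to a hypothetical (gene, start position, UMI) tuple and counts molecules per gene in three ways:

- raw read counts,
- counts collapsed by mapping position,
- counts collapsed by UMI.

The third read is a second molecule that happens to start at the same position as the first (a natural duplicate). Position-based collapsing therefore undercounts GeneA, while UMI-based collapsing keeps that molecule. Real pipelines also consider strand and mate positions and typically merge UMIs that differ by sequencing errors; both steps are omitted here.

```python
from collections import defaultdict

# Hypothetical reads: (gene, mapping start position, UMI sequence)
READS = [
    ("GeneA", 1000, "AACGT"),  # molecule 1
    ("GeneA", 1000, "AACGT"),  # PCR copy of molecule 1
    ("GeneA", 1000, "GGTCA"),  # molecule 2, same start by chance (natural duplicate)
    ("GeneA", 1250, "TTAGC"),  # molecule 3
    ("GeneB", 300, "CCATG"),   # molecule 4
    ("GeneB", 300, "CCATG"),   # PCR copy of molecule 4
]


def count_per_gene(reads, key=None):
    """Count reads per gene, or distinct key(read) values per gene if key is given."""
    groups = defaultdict(list)
    for read in reads:
        groups[read[0]].append(read if key is None else key(read))
    if key is None:
        return {gene: len(items) for gene, items in groups.items()}
    return {gene: len(set(items)) for gene, items in groups.items()}


raw = count_per_gene(READS)
by_position = count_per_gene(READS, key=lambda r: r[1])
by_umi = count_per_gene(READS, key=lambda r: r[2])
print("raw reads:", raw)                # {'GeneA': 4, 'GeneB': 2}
print("position-deduped:", by_position)  # {'GeneA': 2, 'GeneB': 1}
print("UMI-deduped:", by_umi)            # {'GeneA': 3, 'GeneB': 1}
```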

    A systematic evaluation of single cell RNA-seq analysis pipelines

    The recent rapid spread of single-cell RNA sequencing (scRNA-seq) methods has created a large variety of experimental and computational pipelines for which best practices have not yet been established. Here, we use simulations based on five scRNA-seq library protocols in combination with nine realistic differential expression (DE) setups. With these, we systematically evaluate three mapping, four imputation, seven normalisation and four differential expression testing approaches, resulting in ~3,000 pipelines. This also allows us to assess interactions among pipeline steps. We find that the choices of normalisation method and library preparation protocol have the biggest impact on scRNA-seq analyses. Specifically, library preparation determines the ability to detect symmetric expression differences, while normalisation dominates pipeline performance in asymmetric DE setups. Finally, we illustrate the importance of informed choices by showing that a good scRNA-seq pipeline can have the same impact on detecting a biological signal as quadrupling the sample size.
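
Why normalisation matters most in asymmetric DE setups can be seen in a deterministic toy example, which is not taken from the study's simulations. In sample B, three of ten genes are up-regulated four-fold and none are down-regulated.

- Scaling by total counts (library size) inflates B's size factor, so the seven unchanged genes appear down-regulated.
- A DESeq-style median-of-ratios size factor, used here only for illustration, is unaffected as long as most genes do not change.

The function names are hypothetical.

```python
import math
import statistics


def library_size_factors(samples):
    """Size factor per sample = total counts / mean total counts."""
    totals = [sum(s) for s in samples]
    mean_total = statistics.fmean(totals)
    return [t / mean_total for t in totals]


def median_of_ratios_factors(samples):
    """DESeq-style size factors: median ratio of each sample to the per-gene geometric mean."""
    refs = []  # (gene index, geometric mean across samples); genes with any zero are skipped
    for g in range(len(samples[0])):
        values = [s[g] for s in samples]
        if all(v > 0 for v in values):
            refs.append((g, math.exp(statistics.fmean(math.log(v) for v in values))))
    return [statistics.median(s[g] / ref for g, ref in refs) for s in samples]


def apparent_fold_change(gene, a, b, factors):
    """Normalised B/A expression ratio for one gene, given size factors (fa, fb)."""
    fa, fb = factors
    return (b[gene] / fb) / (a[gene] / fa)


A = [100] * 10
B = [100] * 7 + [400] * 3  # genes 7-9 up 4x in B; genes 0-6 unchanged (asymmetric DE)

for name, method in [("library size", library_size_factors),
                     ("median of ratios", median_of_ratios_factors)]:
    fc = apparent_fold_change(0, A, B, method([A, B]))
    print(f"{name}: unchanged gene 0 has apparent fold change {fc:.2f}")
# prints 0.53 for library size (spurious down-regulation) and 1.00 for median of ratios
```

If more than half of the genes changed in the same direction, the median-of-ratios estimate would be biased as well. This is one reason asymmetric setups are demanding for normalisation.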

    zUMIs - A fast and flexible pipeline to process RNA sequencing data with UMIs

    Background: Single-cell RNA-sequencing (scRNA-seq) experiments typically analyze hundreds or thousands of cells after amplification of the cDNA. The high throughput is made possible by the early introduction of sample-specific barcodes (BCs), and the amplification bias is alleviated by unique molecular identifiers (UMIs). Thus, the ideal analysis pipeline for scRNA-seq data needs to efficiently tabulate reads according to both BC and UMI. Findings: zUMIs is a pipeline that can handle both known and random BCs. It also efficiently collapses UMIs, either only for exon-mapping reads or for both exon- and intron-mapping reads. If BC annotation is missing, zUMIs can accurately detect intact cells from the distribution of sequencing reads. Another unique feature of zUMIs is its adaptive downsampling function. This function helps handle widely varying library sizes and also allows the user to evaluate whether a library has been sequenced to saturation. To illustrate the utility of zUMIs, we analyzed a single-nucleus RNA-seq dataset and showed that more than 35% of all reads map to introns. We also show that these intronic reads are informative about expression levels, significantly increasing the number of detected genes and improving the cluster resolution. Conclusions: zUMIs' flexibility makes it possible to accommodate data generated with any of the major scRNA-seq protocols that use BCs and UMIs. It is the most feature-rich, fast, and user-friendly pipeline to process such scRNA-seq data.
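
The idea behind downsampling to assess sequencing saturation can be sketched as follows. This is not zUMIs' implementation, and it rests on three assumptions:

- Reads have already been reduced to hypothetical molecule identifiers, for example cell BC, UMI and gene combined.
- Reads are subsampled to increasing depths by taking prefixes of one shuffled order, so that smaller depths are nested in larger ones.
- Saturation is reported as 1 minus the fraction of reads that are distinct molecules, which is one common definition.

```python
import random


def saturation(reads):
    """Sequencing saturation = 1 - (distinct molecules / reads); 0.0 for no reads."""
    return 0.0 if not reads else 1 - len(set(reads)) / len(reads)


def saturation_curve(reads, depths, seed=0):
    """Return (depth, distinct molecules, saturation) for nested random subsamples."""
    order = list(reads)
    random.Random(seed).shuffle(order)
    return [(d, len(set(order[:d])), saturation(order[:d])) for d in depths]


# Hypothetical library: 1,000 molecules sequenced to 20,000 reads.
rng = random.Random(42)
reads = rng.choices(range(1000), k=20000)
for depth, molecules, sat in saturation_curve(reads, [1000, 5000, 20000]):
    print(f"{depth:>6} reads -> {molecules:>4} molecules, saturation {sat:.2f}")
```

A molecule count that flattens while saturation approaches 1 indicates that further sequencing of this library would mostly yield duplicates.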

    Primate iPS cells as tools for evolutionary analyses

    Induced pluripotent stem cells (iPSCs) are regarded as a central tool to understand human biology in health and disease. Similarly, iPSCs from non-human primates should be a central tool to understand human evolution, in particular for assessing the conservation of regulatory networks in iPSC models. Here, we generated human, gorilla, bonobo and cynomolgus monkey iPSCs and assessed their usefulness in such a framework. We show that these cells are highly comparable in their differentiation potential and are generally similar to human, cynomolgus and rhesus monkey embryonic stem cells (ESCs). RNA sequencing reveals that, within a species, expression differences among clones, individuals and stem cell types are all of very similar magnitude. In contrast, expression differences between closely related primate species are three times larger, and most genes show significant expression differences among the analyzed species. However, pseudogenes differ more than twice as much, suggesting that the evolution of expression levels in primate stem cells is rapid but constrained. These patterns in pluripotent stem cells are comparable to those found in other tissues except testis. Hence, primate iPSCs provide insights into general primate gene expression evolution. They should be a rich source for identifying conserved and species-specific gene expression patterns underlying cellular phenotypes.

    Mopeia Virus–related Arenavirus in Natal Multimammate Mice, Morogoro, Tanzania

    A serosurvey involving 2,520 small mammals from Tanzania identified a hot spot of arenavirus circulation in Morogoro. Molecular screening detected a new arenavirus related to Mopeia virus, Morogoro virus, in Natal multimammate mice (Mastomys natalensis). Only a small percentage of mice carry Morogoro virus, although a large proportion show specific antibodies.

    Sensitive and powerful single-cell RNA sequencing using mcSCRB-seq

    Single-cell RNA sequencing (scRNA-seq) has emerged as a central genome-wide method to characterize cellular identities and processes. Consequently, improving its sensitivity, flexibility, and cost-efficiency can advance many research questions. Among the flexible plate-based methods, single-cell RNA barcoding and sequencing (SCRB-seq) is highly sensitive and efficient. Here, we systematically evaluate experimental conditions of this protocol and find that adding polyethylene glycol considerably increases sensitivity by enhancing cDNA synthesis. Furthermore, using Terra polymerase increases efficiency because cDNA is amplified more evenly, so libraries require less sequencing. We combined these and other improvements to develop a scRNA-seq library protocol we call molecular crowding SCRB-seq (mcSCRB-seq). We show that mcSCRB-seq is one of the most sensitive, efficient, and flexible scRNA-seq methods to date.

    Antimicrobial use in pediatric oncology and hematology in Germany and Austria, 2020/2021: a cross-sectional, multi-center point-prevalence study with a multi-step qualitative adjudication process

    Background: Because pediatric hematology and oncology patients are at high risk of severe infection, antimicrobial use in this population is particularly high. In this study, we quantitatively and qualitatively evaluated antimicrobial usage against institutional standards and national guidelines. We used a point-prevalence survey with a multi-step expert panel approach and analyzed the reasons for inappropriate antimicrobial usage. Methods: This cross-sectional study was conducted at 30 pediatric hematology and oncology centers in 2020 and 2021. Centers affiliated with the German Society for Pediatric Oncology and Hematology were invited to join, and an existing institutional standard was a prerequisite for participation. We included hematologic/oncologic inpatients under 19 years of age who were receiving systemic antimicrobial treatment on the day of the point-prevalence survey. In addition to the one-day point-prevalence survey, external experts individually assessed the appropriateness of each therapy. This step was followed by an expert panel adjudication based on the participating centers' institutional standards and on national guidelines. We analyzed the antimicrobial prevalence rate and the rates of appropriate, inappropriate, and indeterminate antimicrobial therapies with regard to institutional and national guidelines. We compared the results of academic and non-academic centers. We also performed a multinomial logistic regression on center- and patient-related data to identify variables that predict inappropriate therapy. Findings: At the time of the study, a total of 342 patients were hospitalized at 30 hospitals, of whom 320 were included in the calculation of the antimicrobial prevalence rate. The overall antimicrobial prevalence rate was 44.4% (142/320; range 11.1–78.6%), with a median antimicrobial prevalence rate per center of 44.5% (95% confidence interval [CI] 35.9–49.9).
The antimicrobial prevalence rate was significantly higher (p < 0.001) at academic centers (median 50.0%; 95% CI 41.2–55.2) than at non-academic centers (median 20.0%; 95% CI 11.0–32.4). After expert panel adjudication, 33.8% (48/142) of all therapies were labelled inappropriate based on institutional standards. The rate was higher (47.9% [68/142]) when national guidelines were taken into consideration. The most frequent reasons for inappropriate therapy were incorrect dosage (26.2% [37/141]) and (de-)escalation/spectrum-related errors (20.6% [29/141]). Multinomial logistic regression identified three predictors of inappropriate therapy: the number of antimicrobial drugs (odds ratio [OR] 3.13, 95% CI 1.76–5.54, p < 0.001), a diagnosis of febrile neutropenia (OR 0.18, 95% CI 0.06–0.51, p = 0.0015), and an existing pediatric antimicrobial stewardship program (OR 0.35, 95% CI 0.15–0.84, p = 0.019). Our analysis revealed no evidence of a difference between academic and non-academic centers regarding appropriate usage. Interpretation: Our study revealed high levels of antimicrobial usage at German and Austrian pediatric oncology and hematology centers, with significantly higher usage at academic centers. Incorrect dosing was the most frequent reason for inappropriate usage. A diagnosis of febrile neutropenia and the presence of antimicrobial stewardship programs were associated with a lower likelihood of inappropriate therapy. These findings suggest the importance of febrile neutropenia guidelines and guideline compliance, as well as the need for regular antibiotic stewardship counselling at pediatric oncology and hematology centers. Funding: European Society of Clinical Microbiology and Infectious Diseases, Deutsche Gesellschaft für Pädiatrische Infektiologie, Deutsche Gesellschaft für Krankenhaushygiene, Stiftung Kreissparkasse Saarbrücken
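
As a quick consistency check, the proportions reported above can be recomputed from the stated counts. The two error-type percentages use 141 rather than 142 as the denominator, as given in the text. The sketch below simply rounds each ratio to one decimal place and compares it with the reported value.

```python
# Counts and percentages as stated in the abstract: (numerator, denominator, reported %)
REPORTED = {
    "antimicrobial prevalence rate": (142, 320, 44.4),
    "inappropriate, institutional standards": (48, 142, 33.8),
    "inappropriate, national guidelines": (68, 142, 47.9),
    "incorrect dosage": (37, 141, 26.2),
    "(de-)escalation/spectrum-related errors": (29, 141, 20.6),
}


def recompute(reported):
    """Return {label: (recomputed %, reported %)} with percentages rounded to one decimal."""
    return {label: (round(100 * k / n, 1), pct) for label, (k, n, pct) in reported.items()}


for label, (recomputed, pct) in recompute(REPORTED).items():
    print(f"{label}: {recomputed}% (reported {pct}%)")
```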

    Novel Arenavirus Sequences in Hylomyscus sp. and Mus (Nannomys) setulosus from Côte d'Ivoire: Implications for Evolution of Arenaviruses in Africa

    This study aimed to identify new arenaviruses and gather insights into the evolution of arenaviruses in Africa. From 2003 through 2005, 1,228 small mammals representing 14 different genera were trapped in 9 villages in the south, east, and middle west of Côte d'Ivoire. Specimens were screened by pan-Old World arenavirus RT-PCRs targeting the S and L RNA segments, as well as by immunofluorescence assay. Sequences of two novel tentative species of the family Arenaviridae, Menekre virus and Gbagroube virus, were detected in Hylomyscus sp. and Mus (Nannomys) setulosus, respectively. Arenavirus infection of Mus (Nannomys) setulosus was also demonstrated by serological testing. Lassa virus was not found, although 60% of the captured animals were Mastomys natalensis. Complete S RNA and partial L RNA sequences of the novel viruses were recovered from the rodent specimens and subjected to phylogenetic analysis. Gbagroube virus is a closely related sister taxon of Lassa virus, while Menekre virus clusters with the Ippy/Mobala/Mopeia virus complex. Reconstruction of possible virus–host co-phylogeny scenarios suggests that, within the African continent, signatures of co-evolution might have been obliterated by multiple host-switching events.